Emojify!

Welcome to the second assignment of Week 2. You are going to use word vector representations to build an Emojifier.

Have you ever wanted to make your text messages more expressive? Your emojifier app will help you do that. So rather than writing:

"Congratulations on the promotion! Let's get coffee and talk. Love you!"

The emojifier can automatically turn this into:

"Congratulations on the promotion! 👍 Let's get coffee and talk. ☕️ Love you! ❤️"

Using word vectors to improve emoji lookups

What you'll build

  1. In this exercise, you'll start with a baseline model (Emojifier-V1) using word embeddings.
  2. Then you will build a more sophisticated model (Emojifier-V2) that further incorporates an LSTM.


Let's get started! Run the following cell to load the packages you are going to use.
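For reference, here is a minimal sketch of what that cell typically imports; emo_utils is the helper module shipped with this assignment, and the exact contents of the cell may differ slightly in your copy.

import numpy as np
import emoji
import matplotlib.pyplot as plt
from emo_utils import *   # helper functions such as convert_to_one_hot and label_to_emoji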

1 - Baseline model: Emojifier-V1

1.1 - Dataset EMOJISET

Let's start by building a simple baseline classifier.

You have a tiny dataset (X, Y) where:

  • X contains 127 sentences (strings)
  • Y contains an integer label between 0 and 4 corresponding to the emoji for each sentence

**Figure 1**: EMOJISET - a classification problem with 5 classes. A few examples of sentences are given here.

Let's load the dataset using the code below. We split the dataset between training (127 examples) and testing (56 examples).

Run the following cell to print sentences from X_train and corresponding labels from Y_train.
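As a rough sketch (assuming a label_to_emoji() helper from emo_utils that maps a label index to its emoji), that cell looks something like this:

for idx in range(10):
    print(X_train[idx], label_to_emoji(Y_train[idx]))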

1.2 - Overview of the Emojifier-V1

In this part, you are going to implement a baseline model called "Emojifier-v1".

**Figure 2**: Baseline model (Emojifier-V1).

Inputs and outputs

One-hot encoding

Let's see what convert_to_one_hot() did. Feel free to change index to print out different values.
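A minimal illustration, assuming convert_to_one_hot() takes the number of classes as its second argument C:

idx = 50   # pick any index into the training set
Y_oh_train = convert_to_one_hot(Y_train, C=5)
Y_oh_test = convert_to_one_hot(Y_test, C=5)
print(Y_train[idx], "is converted into one-hot", Y_oh_train[idx])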

All the data is now ready to be fed into the Emojifier-V1 model. Let's implement the model!

1.3 - Implementing Emojifier-V1

As shown in Figure 2 (above), the first step is to convert an input sentence into its word-vector representation; these word vectors are then averaged together. As in the previous exercise, we will use pre-trained 50-dimensional GloVe embeddings.

Run the following cell to load the word_to_vec_map, which contains all the vector representations.

You've loaded:

  • word_to_index: a dictionary mapping words to their indices in the vocabulary (400,001 words, with valid indices from 0 to 400,000)
  • index_to_word: a dictionary mapping indices to their corresponding words in the vocabulary
  • word_to_vec_map: a dictionary mapping words to their GloVe vector representations

Run the following cell to check if it works.
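For example (the word and index below are only placeholders; any word in the GloVe vocabulary and any index from 0 to 400,000 will do):

word = "cucumber"
idx = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(idx) + "th word in the vocabulary is", index_to_word[idx])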

Exercise: Implement sentence_to_avg(). You will need to carry out two steps:

  1. Convert every sentence to lower-case, then split the sentence into a list of words.
    • X.lower() and X.split() might be useful.
  2. For each word in the sentence, access its GloVe representation.
    • Then take the average of all of these word vectors.
    • You might use numpy.zeros().

Additional Hints
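As an additional hint, here is a rough sketch of one way sentence_to_avg() could be written. It relies on numpy (imported above as np), assumes 50-dimensional GloVe vectors, and assumes every word of the sentence appears in word_to_vec_map; it is not necessarily identical to the reference solution.

def sentence_to_avg(sentence, word_to_vec_map):
    # Step 1: lower-case the sentence and split it into a list of words
    words = sentence.lower().split()

    # Step 2: average the GloVe vectors of all the words
    avg = np.zeros((50,))            # 50 = embedding dimension (assumed)
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg / len(words)

    return avg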

Expected Output:

avg =
[-0.008005    0.56370833 -0.50427333  0.258865    0.55131103  0.03104983
 -0.21013718  0.16893933 -0.09590267  0.141784   -0.15708967  0.18525867
  0.6495785   0.38371117  0.21102167  0.11301667  0.02613967  0.26037767
  0.05820667 -0.01578167 -0.12078833 -0.02471267  0.4128455   0.5152061
  0.38756167 -0.898661   -0.535145    0.33501167  0.68806933 -0.2156265
  1.797155    0.10476933 -0.36775333  0.750785    0.10282583  0.348925
 -0.27262833  0.66768    -0.10706167 -0.283635    0.59580117  0.28747333
 -0.3366635   0.23393817  0.34349183  0.178405    0.1166155  -0.076433
  0.1445417   0.09808667]

Model

You now have all the pieces to finish implementing the model() function. After using sentence_to_avg() you need to:

  • Pass the average through forward propagation
  • Compute the cost
  • Backpropagate to update the softmax parameters W and b

Exercise: Implement the model() function described in Figure 2. Assuming that $Y_{oh}$ ("Y one-hot") is the one-hot encoding of the output labels, the equations you need to implement in the forward pass and to compute the cross-entropy cost are:

$$ z^{(i)} = W \cdot avg^{(i)} + b $$

$$ a^{(i)} = softmax(z^{(i)}) $$

$$ \mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} Y_{oh,k}^{(i)} \log(a^{(i)}_k) $$

Note: It is possible to come up with a more efficient vectorized implementation. For now, let's use nested for loops to better understand the algorithm and to make debugging easier.

We provided the function softmax(), which was imported earlier.
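As a sketch, the inner loop of model() that implements the three equations above might look like the following, where W, b, Y_oh (the one-hot labels) and the number of examples m are assumed to have been set up earlier in the function:

for i in range(m):                                   # loop over the training examples
    avg = sentence_to_avg(X[i], word_to_vec_map)

    # Forward propagation
    z = np.dot(W, avg) + b
    a = softmax(z)

    # Cross-entropy loss for this example
    cost = -np.sum(Y_oh[i] * np.log(a))

    # (Gradients of W and b and the parameter updates would follow here.)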

Run the next cell to train your model and learn the softmax parameters (W,b).

Expected Output (on a subset of iterations):

**Epoch: 0** cost = 1.95204988128 Accuracy: 0.348484848485
**Epoch: 100** cost = 0.0797181872601 Accuracy: 0.931818181818
**Epoch: 200** cost = 0.0445636924368 Accuracy: 0.954545454545
**Epoch: 300** cost = 0.0343226737879 Accuracy: 0.969696969697

Great! Your model has pretty high accuracy on the training set. Let's now see how it does on the test set.

1.4 - Examining test set performance

Expected Output:

**Train set accuracy** 97.7
**Test set accuracy** 85.7

The model matches emojis to relevant words

In the training set, the algorithm saw the sentence

"I love you"

with the label ❤️.

Amazing! Even though a word like "adore" never appears in the training set, its GloVe embedding is close to that of "love", so the model still generalizes correctly and predicts ❤️ for "I adore you".

Word ordering isn't considered in this model

Confusion matrix

What you should remember from this section

You will build a better algorithm in the next section!

2 - Emojifier-V2: Using LSTMs in Keras

Let's build an LSTM model that takes word sequences as input!

Run the following cell to load the Keras packages.
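The imports are roughly as follows; the exact module paths depend on your Keras version, so treat this as a sketch:

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Dropout, LSTM, Activation, Embedding
np.random.seed(1)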

2.1 - Overview of the model

Here is the Emojifier-v2 you will implement:


**Figure 3**: Emojifier-V2. A 2-layer LSTM sequence classifier.

2.2 - Keras and mini-batching

Padding handles sequences of varying length

To train a Keras model on mini-batches efficiently, all sequences in the same mini-batch need to have the same length. We therefore pad every sentence with zeros up to a fixed maximum length, max_len.

Example of padding

For example, if max_len = 20, the sentence "I love you" would be represented as $(e_{I}, e_{love}, e_{you}, \vec{0}, \vec{0}, \ldots, \vec{0})$, where the embeddings of the three words are followed by zero vectors so that the input sequence has length 20.

2.3 - The Embedding layer

Using and updating pre-trained embeddings

Inputs and outputs to the embedding layer

**Figure 4**: Embedding layer

Prepare the input sentences

Exercise: Implement sentences_to_indices(). This function processes an array of sentences X and returns an array of indices into the vocabulary, padded with zeros so that every sentence has length max_len, ready to be fed to the Embedding layer.

Additional Hints
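As a hint, one possible implementation looks roughly like this (it relies on numpy and the word_to_index dictionary loaded earlier, and assumes every word of every sentence is in the vocabulary):

def sentences_to_indices(X, word_to_index, max_len):
    m = X.shape[0]                                # number of examples
    X_indices = np.zeros((m, max_len))            # unfilled positions stay 0 (padding)

    for i in range(m):
        sentence_words = X[i].lower().split()     # lower-case and tokenize
        for j, w in enumerate(sentence_words):
            X_indices[i, j] = word_to_index[w]    # index of the j-th word of sentence i

    return X_indices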

Run the following cell to check what sentences_to_indices() does, and check your results.

Expected Output:

X1 = ['funny lol' 'lets play baseball' 'food is ready for you']
X1_indices =
 [[ 155345.  225122.       0.       0.       0.]
 [ 220930.  286375.   69714.       0.       0.]
 [ 151204.  192973.  302254.  151349.  394475.]]

Build embedding layer

Exercise: Implement pretrained_embedding_layer() with these steps:

  1. Initialize the embedding matrix as a numpy array of zeros.
    • The embedding matrix has a row for each unique word in the vocabulary.
      • There is one additional row to handle "unknown" words.
      • So vocab_len is the number of unique words plus one.
    • Each row will store the vector representation of one word.
      • For example, one row may be 50 positions long if using GloVe word vectors.
    • In the code below, emb_dim represents the length of a word embedding.
  2. Fill in each row of the embedding matrix with the vector representation of a word
    • Each word in word_to_index is a string.
    • word_to_vec_map is a dictionary where the keys are strings and the values are the word vectors.
  3. Define the Keras embedding layer.
    • Use Embedding().
    • The input dimension is equal to the vocabulary length (number of unique words plus one).
    • The output dimension is equal to the number of positions in a word embedding.
    • Make this layer's embeddings fixed.
      • If you were to set trainable = True, the optimization algorithm would be allowed to modify the values of the word embeddings.
      • In this case, we don't want the model to modify the word embeddings.
  4. Set the embedding weights to be equal to the embedding matrix.
    • Note that this part of the code is already completed for you and does not need to be modified.
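Putting these steps together, a rough sketch of pretrained_embedding_layer() could look like this. The word "cucumber" is used only as an arbitrary key to read off the embedding dimension, and the details are one possible implementation rather than the definitive one:

def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    vocab_len = len(word_to_index) + 1               # number of unique words plus one
    emb_dim = word_to_vec_map["cucumber"].shape[0]   # length of a word embedding (e.g. 50)

    # Step 1: initialize the embedding matrix with zeros
    emb_matrix = np.zeros((vocab_len, emb_dim))

    # Step 2: copy each word's vector into the row given by its index
    for word, idx in word_to_index.items():
        emb_matrix[idx, :] = word_to_vec_map[word]

    # Step 3: define a non-trainable Keras Embedding layer
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)

    # Step 4: build the layer and load the pre-trained weights
    embedding_layer.build((None,))
    embedding_layer.set_weights([emb_matrix])

    return embedding_layer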

Expected Output:

weights[0][1][3] = -0.3403

2.4 - Building the Emojifier-V2

Let's now build the Emojifier-V2 model.



Exercise: Implement Emojify_V2(), which builds a Keras graph of the architecture shown in Figure 3.

Additional Hints

# How to use Keras layers in two lines of code
dense_object = Dense(units = ...)
X = dense_object(inputs)

# How to use Keras layers in one line of code
X = Dense(units = ...)(inputs)
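Putting the pieces together, a sketch of Emojify_V2() consistent with Figure 3 might look like the following. The 128-unit LSTM size and 0.5 dropout rate are illustrative choices, not values specified in this section:

def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    # Input: a batch of sentences, each encoded as a row of max_len word indices
    sentence_indices = Input(shape=input_shape, dtype='int32')

    # Pre-trained, non-trainable embedding layer
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)

    # Two stacked LSTM layers with dropout in between
    X = LSTM(128, return_sequences=True)(embeddings)   # first LSTM returns the full sequence
    X = Dropout(0.5)(X)
    X = LSTM(128, return_sequences=False)(X)            # second LSTM returns only its last hidden state
    X = Dropout(0.5)(X)

    # 5-way softmax output, one unit per emoji class
    X = Dense(5)(X)
    X = Activation('softmax')(X)

    return Model(inputs=sentence_indices, outputs=X)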

Run the following cell to create your model and check its summary. Because all sentences in the dataset are fewer than 10 words long, we chose max_len = 10. You should see your architecture: it uses 20,223,927 parameters, of which 20,000,050 (the word embeddings) are non-trainable and the remaining 223,877 are trainable. Because our vocabulary contains 400,001 words (with valid indices from 0 to 400,000), there are 400,001 * 50 = 20,000,050 non-trainable parameters.

As usual, after creating your model in Keras, you need to compile it and define the loss, optimizer and metrics you want to use. Compile your model using the categorical_crossentropy loss, the adam optimizer and ['accuracy'] metrics:
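In code, that is a single line:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])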

It's time to train your model. Your Emojifier-V2 model takes as input an array of shape (m, max_len) and outputs probability vectors of shape (m, number of classes). We thus have to convert X_train (array of sentences as strings) to X_train_indices (array of sentences as list of word indices), and Y_train (labels as indices) to Y_train_oh (labels as one-hot vectors).

Fit the Keras model on X_train_indices and Y_train_oh. We will use epochs = 50 and batch_size = 32.
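A sketch of those steps, assuming max_len = 10 as chosen above:

X_train_indices = sentences_to_indices(X_train, word_to_index, max_len)
Y_train_oh = convert_to_one_hot(Y_train, C=5)
model.fit(X_train_indices, Y_train_oh, epochs=50, batch_size=32, shuffle=True)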

Your model should achieve roughly 90% to 100% accuracy on the training set. The exact accuracy you get may be a little different. Run the following cell to evaluate your model on the test set.

You should get a test accuracy between 80% and 95%. Run the cell below to see the mislabelled examples.

Now you can try it on your own example. Write your own sentence below.
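For example (the sentence below is just a placeholder, and label_to_emoji() is assumed to be the helper from emo_utils that maps a label index to its emoji):

x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, max_len)
pred = model.predict(X_test_indices)
print(x_test[0], label_to_emoji(int(np.argmax(pred))))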

LSTM version accounts for word order

Congratulations!

You have completed this notebook! ❤️❤️❤️

What you should remember

Input sentences:

"Congratulations on finishing this assignment and building an Emojifier."
"We hope you're happy with what you've accomplished in this notebook!"

Output emojis:

😀😀😀😀😀😀

Acknowledgments

Thanks to Alison Darcy and the Woebot team for their advice on the creation of this assignment.